16. Linear Update

You can’t train a neural network on a single sample. Let’s apply n samples of x to the function y = Wx + b, which in matrix form becomes Y = WX + B.

Y = WX + B

For every sample of X (X1, X2, X3), we get logits for label 1 (Y1) and label 2 (Y2).

In order to add the bias to the product WX, we had to turn b into a matrix B of the same shape. This is wasteful, since the bias is really just two numbers; it should be a vector.
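
Here is a minimal NumPy sketch of that tiling (the sizes used here, 4 features, 3 samples, and 2 labels, are assumptions for illustration):

import numpy as np

# Assumed sizes for illustration: 2 labels, 4 features, 3 samples
W = np.random.randn(2, 4)             # weights: one row per label
X = np.random.randn(4, 3)             # inputs: one column per sample
b = np.array([0.1, -0.2])             # the bias is really just two numbers
B = np.tile(b.reshape(2, 1), (1, 3))  # tiled into a full 2x3 matrix
Y = W @ X + B                         # logits: 2 labels x 3 samples
print(Y.shape)                        # (2, 3)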

We can take advantage of an operation called broadcasting, supported by both TensorFlow and NumPy. It allows arrays of different shapes to be added or multiplied together. For example:

import numpy as np

t = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])  # shape (4, 3)
u = np.array([1, 2, 3])                                        # shape (3,)
print(t + u)  # u is added to every row of t

The code above will print…

[[ 2  4  6]
 [ 5  7  9]
 [ 8 10 12]
 [11 13 15]]

This works because u has the same length as the last dimension of t, so NumPy stretches u across every row of t and adds it element-wise.
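
Applied to our bias, broadcasting lets b stay a vector instead of a tiled matrix. One wrinkle: because our samples sit in the columns of X, b has to be shaped as a 2x1 column so that NumPy stretches it across the sample axis. Again a sketch with the same assumed sizes:

import numpy as np

W = np.random.randn(2, 4)      # 2 labels x 4 features (assumed sizes)
X = np.random.randn(4, 3)      # 4 features x 3 samples
b = np.array([[0.1], [-0.2]])  # bias as a 2x1 column vector
Y = W @ X + b                  # b is broadcast across all 3 sample columns
print(Y.shape)                 # (2, 3)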